A Multi Join Algorithm Utilizing Double Indices

Authors

  • Hanan A. Mahmoud
  • Lilac A. E. Al-Safadi
Abstract

Join has always been one of the most expensive queries to process in terms of time. This paper introduces a novel multi-join algorithm for joining multiple relations. The algorithm is based on a hash-based join of two relations that produces a double index, built by scanning the two relations once. Instead of moving the records into buckets, the scan builds a double index, which eliminates the collisions that a complete hash algorithm can cause. The double index is divided into join buckets of similar categories from the two relations, and buckets with similar keys are joined to produce joined buckets. The end result is a complete join index of the two relations, obtained without joining the actual relations. Building the join index of two categories takes O(m log m) time, where m is the size of each category. The proposed algorithm has a time complexity of O(n log m) over all buckets, where n is the number of buckets. The join index is used to materialize the joined relation if required. Otherwise, together with the join indices of other relations, it builds a lattice to be used in multi-join operations with minimal I/O requirements. The lattice of join indices can be fitted into main memory to reduce the time complexity of the multi-join algorithm.
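The idea of a join index built without moving records can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes a plain in-memory dictionary stands in for the double index, and the function names (`build_double_index`, `build_join_index`, `materialize`) and the sample relations are hypothetical. Each relation is scanned once, only row ids are stored, and the joined relation is materialized only on request.

```python
from collections import defaultdict


def build_double_index(r, s, r_key, s_key):
    """Scan each relation once and map every join key to a pair of
    row-id lists: (row ids in r, row ids in s). No records are moved."""
    index = defaultdict(lambda: ([], []))
    for rid, row in enumerate(r):
        index[row[r_key]][0].append(rid)
    for sid, row in enumerate(s):
        index[row[s_key]][1].append(sid)
    return index


def build_join_index(double_index):
    """Pair up row ids that share a key. The result is a join index:
    a list of (row id in r, row id in s) pairs, not joined records."""
    pairs = []
    for r_ids, s_ids in double_index.values():
        for rid in r_ids:
            for sid in s_ids:
                pairs.append((rid, sid))
    return pairs


def materialize(r, s, join_index):
    """Build the joined relation from the join index, only if required."""
    return [{**r[rid], **s[sid]} for rid, sid in join_index]


# Hypothetical sample relations, used only for illustration.
employees = [
    {"emp": "Ann", "dept_id": 1},
    {"emp": "Bob", "dept_id": 2},
    {"emp": "Cy", "dept_id": 1},
]
departments = [
    {"dept_id": 1, "dept": "Sales"},
    {"dept_id": 3, "dept": "Ops"},
]

double_index = build_double_index(employees, departments, "dept_id", "dept_id")
join_index = build_join_index(double_index)
joined = materialize(employees, departments, join_index)
```

In this sketch the join index holds only pairs of row ids, so it stays small compared with the relations themselves. That property is what makes it plausible to keep several such indices in main memory, as the abstract describes for the lattice.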


Similar resources

An Efficient Multi Join Algorithm Utilizing a Lattice of Double Indices

In this paper, a novel multi join algorithm to join multiple relations will be introduced. The novel algorithm is based on a hash-based join algorithm of two relations to produce a double index. This is done by scanning the two relations once. But instead of moving the records into buckets, a double index will be built. This will eliminate the collision that can happen from a complete hash al...


A Comparison of Indexed Temporal Joins

We examine temporal joins in the presence of indexing schemes. Utilizing an index when processing join queries is especially advantageous if the join predicates involve only a portion of the temporal relations. This is a novel problem since temporal indices have various characteristics that can affect join processing drastically. For example, temporal indices commonly introduce record copies to...


A Data Mining Approach for selecting Bitmap Join Indices

Index selection is one of the most important decisions to take in the physical design of relational data warehouses. Indices reduce significantly the cost of processing complex OLAP queries, but require storage cost and induce maintenance overhead. Two main types of indices are available: mono-attribute indices (e.g., B-tree, bitmap, hash, etc.) and multi-attribute indices (join indices, bitmap...


Rank Join Queries in NoSQL Databases

Rank (i.e., top-k) join queries play a key role in modern analytics tasks. However, despite their importance and unlike centralized settings, they have been completely overlooked in cloud NoSQL settings. We attempt to fill this gap: We contribute a suite of solutions and study their performance comprehensively. Baseline solutions are offered using SQL-like languages (like Hive and Pig), based on...


Index Based Processing of Semi-Restrictive Temporal Joins

Temporal joins are important but very costly operations. While a temporal join can involve the whole time (and/or key) domain, we consider the more general case where the join is defined by some time-key rectangle from the whole space (i.e., when the user is interested in joining portions of the –usually large– temporal data). In the most restrictive join, objects (within this rectangle) are jo...



Journal title:
  • JCIT

Volume 4  Issue 

Pages  -

Publication date 2009